Integration Of Syntactic And Lexical Information In A Hierarchical Dependency Grammar

Authors

Cristina Barbero

Leonardo Lesmo

Vincenzo Lombardo

Abstract

In this paper, we propose to introduce syntactic classes in a lexicalized dependency formalism. Subcategories of words are organized hierarchically from a general, abstract level (syntactic categories) to a word-specific level (single lexical items). The formalism is parsimonious and useful for processing. We also sketch a parsing model that uses the hierarchical mixed-grain representation to make predictions on the structure of the input.

1 Introduction

Much recent work in linguistics and computational linguistics emphasizes the role of lexical information in syntactic representation and processing. This emphasis on the lexicon is the result of a gradual process. The original trend in linguistics was to identify categories of words having related characteristics (the traditional syntactic categories like verb, noun, adjective, etc.) and to express the structure of a sentence in terms of constituents, or phrases, built around these categories.

Subsequent considerations led to a lexicalization of grammar. Linguistically, the constraints expressed on syntactic categories are too general to explain facts about words (e.g. the relation between a verb and its nominalization, "destroy the city" and "destruction of the city") or to account uniformly for a number of phenomena across languages (e.g. passivization). In parsing, the use of individual item information reduces the search space of the possible structures of a sentence. From a mathematical point of view, lexicalized grammars exhibit properties like finite ambiguity (Schabes, 1990) that are of practical interest (especially in writing realistic grammars). Dependency grammar is naturally suited to lexicalization, as the binary relations representing the structure of a sentence are defined with respect to the head (that is, a word).

Purely lexicalized formalisms, however, also have several disadvantages. Linguistically, the abstract level provided by syntactic rules is necessary to avoid the loss of generalization that would arise if class-level information were repeated in all lexical items. In parsing, a predictive component is required to guarantee the valid prefix property, namely the capability of detecting as soon as possible whether a substring is a valid prefix for the language defined by the grammar. Knowledge of syntactic categories, which does not depend on the input, is needed for a parser to be predictive.

In this paper we address the problem of the interaction between syntactic and lexical information in dependency grammar. We introduce many intermediate levels between lexical items and syntactic categories, by organizing the grammar around the notion of subcategorization. Intuitively, a subcategorization frame for a lexical item L is a specification of the number and type of elements that L requires in order for an utterance that contains L to be well-formed. For example, within the syntactic category VERB, different verbs require different numbers of nominal dependents for a well-formed sentence. In Italian (our case study), an intransitive verb such as dormire, "sleep", subcategorizes for only one nominal element (the subject), while a transitive verb such as baciare, "kiss", subcategorizes for two nominal elements (the subject and the object) [1].

Grammatical relations such as subject and object are primitive concepts in a dependency paradigm, i.e. they directly define the structure of the sentence. Consequently, the dependency paradigm is particularly suitable for defining the grammar in terms of constraints on subcategorization frames. Our proposal is to use subcategories organized in a hierarchy: the upper level of the hierarchy corresponds to the syntactic categories, the other levels correspond to subcategories that are more and more...

[1] We include the subject relation in the subcategorization, or valency, of a verb; cf. (Hudson, 1990), (Mel'cuk, 1988). In most constituency theories, on the contrary, the subject is not part of the valency of a verb.
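As an illustration of the idea described above, the following is a minimal Python sketch of a subcategory hierarchy in which each node inherits the subcategorization frame of its parent. It is not the authors' formalism: the class name Subcategory, the intermediate node names VERB-INTRANS and VERB-TRANS, and the representation of a frame as a tuple of relation names are all assumptions made for this example. Only the category VERB, the verbs dormire and baciare, and the relations subject and object come from the text.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Subcategory:
    """One node of a hypothetical subcategory hierarchy."""
    name: str
    parent: Optional["Subcategory"] = None
    adds: Tuple[str, ...] = ()  # relations introduced at this level

    def frame(self) -> Tuple[str, ...]:
        """Subcategorization frame: inherited relations plus local ones."""
        inherited = self.parent.frame() if self.parent is not None else ()
        return inherited + tuple(r for r in self.adds if r not in inherited)

    def is_a(self, other: "Subcategory") -> bool:
        """True if `other` is this node or one of its ancestors."""
        node: Optional[Subcategory] = self
        while node is not None:
            if node is other:
                return True
            node = node.parent
        return False


# Top level: a syntactic category (from the text).
VERB = Subcategory("VERB")
# Intermediate levels: hypothetical names chosen for this sketch.
INTRANS = Subcategory("VERB-INTRANS", VERB, ("subject",))
TRANS = Subcategory("VERB-TRANS", VERB, ("subject", "object"))
# Bottom level: single lexical items (Italian examples from the text).
DORMIRE = Subcategory("dormire", INTRANS)
BACIARE = Subcategory("baciare", TRANS)

print(DORMIRE.frame())     # ('subject',)
print(BACIARE.frame())     # ('subject', 'object')
print(BACIARE.is_a(VERB))  # True
```

Because class-level information is stored once on the intermediate nodes, the lexical items dormire and baciare carry no frame of their own here. This is one way to picture the generalization that a purely lexicalized grammar would lose.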

Related resources

The two be's of English

This qualitative study investigates the uses of be in Contemporary English. Based on this study, one easy claim and one more difficult claim are proposed. The easy claim is that the traditional distinction between be as a lexical verb and be as an auxiliary is faulty. In particular, 'copular-be', traditionally considered to be a lexical verb, is in fact a prototypi...

Deep Lexical Segmentation and Syntactic Parsing in the Easy-First Dependency Framework

We explore the consequences of representing token segmentations as hierarchical structures (trees) for the task of Multiword Expression (MWE) recognition, in isolation or in combination with dependency parsing. We propose a novel representation of token segmentation as trees on tokens, resembling dependency trees. Given this new representation, we present and evaluate two different architecture...

A Novel Approach to Conditional Random Field-based Named Entity Recognition using Persian Specific Features

Named Entity Recognition is an information extraction technique that identifies named entities in a text. Three methods have conventionally been used to extract named entities from a text: rule-based, machine-learning-based, and hybrids of the two. Machine-learning-based methods have good performance in the Persian language if they are trained with good features. To get good performanc...

Syntactic Stylometry: Using Sentence Structure for Authorship Attribution

Most approaches to statistical stylometry have concentrated on lexical features, such as relative word frequencies or type-token ratios. Syntactic features have been largely ignored. This work attempts to fill that void by introducing a technique for authorship attribution based on dependency grammar. Syntactic features are extracted from texts using a common dependency parser, and those featur...

Disambiguation of Super Parts of Speech (or Supertags): Almost

In a lexicalized grammar formalism such as Lexicalized Tree-Adjoining Grammar (LTAG), each lexical item is associated with at least one elementary structure (supertag) that localizes syntactic and semantic dependencies. Thus a parser for a lexicalized grammar must search a large set of supertags to choose the right ones to combine for the parse of the sentence. We present techniques for disambi...

Publication date: 1998
